Efficiency of Reproducible Level 1 BLAS
Abstract
Numerical reproducibility failures appear in massively parallel floating-point computations. One way to guarantee numerical reproducibility is to extend IEEE-754 correct rounding to larger computing sequences, for instance to the BLAS routines. Is the extra cost of numerical reproducibility acceptable in practice? We present solutions and experiments for the Level 1 BLAS and draw conclusions about the efficiency of these reproducible routines.
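As an illustration of the kind of building block such reproducible routines rely on, the sketch below computes a dot product with error-free transformations (the Dot2 scheme of Ogita, Rump and Oishi). It is only a minimal sequential sketch, not the code evaluated in the paper, and it assumes strict IEEE-754 semantics (options such as -ffast-math would defeat the compensation).

```c
#include <math.h>   /* fma */

/* Error-free transformations: each pair (result, err) satisfies
 * a (op) b = result + err exactly in IEEE-754 double precision.   */
static void two_sum(double a, double b, double *s, double *e)  /* Knuth's TwoSum */
{
    *s = a + b;
    double z = *s - a;
    *e = (a - (*s - z)) + (b - z);
}

static void two_prod(double a, double b, double *p, double *e) /* FMA-based TwoProd */
{
    *p = a * b;
    *e = fma(a, b, -*p);          /* exact rounding error of the product */
}

/* Dot product evaluated as if in twice the working precision (Dot2).
 * Sequential, hence deterministic from run to run.                 */
double dot2(const double *x, const double *y, int n)
{
    if (n <= 0) return 0.0;
    double p, s, h, r, q;
    two_prod(x[0], y[0], &p, &s);
    for (int i = 1; i < n; i++) {
        two_prod(x[i], y[i], &h, &r);
        two_sum(p, h, &p, &q);    /* accumulate the high part */
        s += q + r;               /* gather the error terms */
    }
    return p + s;
}
```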
Related papers
Reproducible, Accurately Rounded and Efficient BLAS
Numerical reproducibility failures arise in parallel computation because floating-point summation is non-associative. Massively parallel and optimized executions dynamically modify the floating-point operation order, so numerical results may change from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct rounding property to larger op...
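To make the non-associativity concrete, here is a small standalone C example (illustrative only, not taken from the paper): the same three summands give two different IEEE-754 double results depending on parenthesization, which is exactly what a change of reduction order produces.

```c
#include <stdio.h>

int main(void)
{
    double a = 9007199254740992.0;   /* 2^53: above this, doubles are spaced 2 apart */
    double b = 1.0, c = 1.0;

    double left  = (a + b) + c;      /* a + 1 rounds back to a (ties-to-even), twice */
    double right = a + (b + c);      /* b + c = 2 is exact and a + 2 is representable */

    printf("left  = %.1f\n", left);  /* 9007199254740992.0 */
    printf("right = %.1f\n", right); /* 9007199254740994.0 */
    return 0;
}
```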
Evaluating Block Algorithm Variants in LAPACK
The LAPACK software project currently under development is intended to provide a portable linear algebra library for high performance computers. LAPACK will make use of the Level 1, 2, and 3 BLAS to carry out basic operations. A principal focus of this project is to implement blocked versions of a number of algorithms to take advantage of the greater parallelism and improved data locality of th...
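As a rough sketch of the blocking idea mentioned above (not code from LAPACK or the BLAS), the routine below performs the update C = C + A*B on square row-major matrices by small blocks so that the working set fits in cache; the block size BS is a made-up tuning parameter.

```c
#include <stddef.h>

#define BS 64   /* hypothetical block size; tuned to the cache hierarchy in practice */

/* Blocked C = C + A*B for n x n row-major matrices: the three outer loops walk
 * over blocks, the three inner loops stay inside one block for data locality. */
void gemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                for (int i = ii; i < ii + BS && i < n; i++)
                    for (int k = kk; k < kk + BS && k < n; k++) {
                        double aik = A[(size_t)i * n + k];
                        for (int j = jj; j < jj + BS && j < n; j++)
                            C[(size_t)i * n + j] += aik * B[(size_t)k * n + j];
                    }
}
```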
Efficient Reproducible Floating Point Summation and BLAS
We define reproducibility to mean getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should ideally not change the answer. Many users depend on reproducibility for debugging or correctness [1]. However, dynamic scheduling of parallel computing resources, combined with nonassociativity of floating point additi...
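A heavily simplified sketch of the pre-rounding idea behind such reproducible summation (in the spirit of Demmel and Nguyen's binned sums and of Rump's error-free vector transformation) is given below. The function name, the choice of the boundary sigma, and the overflow handling are illustrative assumptions, not the algorithm of the cited paper.

```c
#include <math.h>

/* One extraction sweep: every x[i] is pre-rounded against a power-of-two
 * boundary sigma, so the extracted high parts all lie on a common grid and
 * add without rounding error; their sum is therefore independent of the
 * summation order. The low parts are summed conventionally here and only
 * serve as an accuracy estimate; corner cases (overflow, exact choice of
 * sigma) are deliberately simplified.                                     */
double repro_sum_sketch(const double *x, int n, double *tail_sum)
{
    double m = 0.0;
    for (int i = 0; i < n; i++)                  /* largest magnitude drives sigma */
        if (fabs(x[i]) > m) m = fabs(x[i]);
    if (m == 0.0) { *tail_sum = 0.0; return 0.0; }

    /* power of two >= 2*n*m so that n extracted parts cannot leave the grid range */
    double sigma = ldexp(1.0, (int)ceil(log2(2.0 * n * m)));

    double high = 0.0, low = 0.0;
    for (int i = 0; i < n; i++) {
        volatile double t = sigma + x[i];        /* force the pre-rounding step */
        double q = t - sigma;                    /* high part of x[i] on the grid */
        high += q;                               /* exact: grid-aligned and bounded */
        low  += x[i] - q;                        /* leftover, summed conventionally */
    }
    *tail_sum = low;                             /* high + *tail_sum: more accurate value */
    return high;                                 /* reproducible part */
}
```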
Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell 8i Processor
In this paper, we compare inverse iteration algorithms on the PowerXCell 8i processor, a well-known heterogeneous environment. When some of the eigenvalues are close together, or there are clusters of eigenvalues, reorthogonalization must be applied to all the eigenvectors associated with such eigenvalues. Reorthogonalization algorithms incur a large computational cost. The...
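For illustration, a minimal modified Gram-Schmidt reorthogonalization step, the operation whose cost this paper is concerned with, might look as follows in C; the function name, the row-major layout of Q, and the failure convention are assumptions, not taken from the paper.

```c
#include <math.h>
#include <stddef.h>

/* Re-orthogonalize a candidate eigenvector v (length n) against k previously
 * accepted eigenvectors stored as the rows of Q (k x n, row-major), then
 * normalize it. Returns 0 on success, -1 if v lies (numerically) in span(Q),
 * in which case inverse iteration would restart with a new starting vector. */
int reorthogonalize(double *v, const double *Q, int k, int n)
{
    for (int j = 0; j < k; j++) {
        const double *qj = Q + (size_t)j * n;
        double dot = 0.0;
        for (int i = 0; i < n; i++) dot += qj[i] * v[i];
        for (int i = 0; i < n; i++) v[i] -= dot * qj[i];   /* remove component along qj */
    }
    double nrm = 0.0;
    for (int i = 0; i < n; i++) nrm += v[i] * v[i];
    nrm = sqrt(nrm);
    if (nrm == 0.0) return -1;
    for (int i = 0; i < n; i++) v[i] /= nrm;
    return 0;
}
```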
The Implementation of BLAS Level 3 on the AP 1000: Preliminary Report
The Basic Linear Algebra Subprogram (BLAS) library is widely used in many supercomputing applications, and is used to implement more extensive linear algebra subroutine libraries, such as LINPACK and LAPACK. To take advantage of the high degree of parallelism of architectures such as the Fujitsu AP1000, BLAS level 3 routines (matrix-matrix operations) are proposed. This project is concerned wit...
Publication year: 2014